feat: add vLLM ROCm Docker variant for AMD GPUs #917
Conversation
Hey - I've found 2 issues and left some high-level feedback:
- The vLLM ROCm Docker stage duplicates most of the CUDA vLLM setup; consider factoring the common vLLM/uv/venv installation into a shared intermediate stage or script to keep the two variants in sync more easily (a rough sketch follows this list).
- In the vLLM ROCm Docker stage, you may want to align version configuration with the rest of the image (e.g., via .versions or shared ARGs) so VLLM_ROCM_VERSION and PYTORCH_ROCM_VERSION don’t silently drift from other ROCm-related components.
- For the ROCm image’s apt install line, consider adding --no-install-recommends to reduce image size and keep the dependency set tighter, especially since this base is already quite large.
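To make the first two points concrete, here is one way the common setup could be shared; the stage names, the `.versions` sourcing, and the exact split are assumptions, not the PR's actual layout:

```dockerfile
# Hypothetical shared stage: uv install and venv creation live in one place, so
# the CUDA and ROCm vLLM stages only differ in the wheel indexes they pull from.
FROM base AS vllm-common
RUN curl -LsSf https://astral.sh/uv/install.sh | sh \
    && ~/.local/bin/uv venv --python 3.12 /opt/vllm-env

FROM vllm-common AS vllm-rocm
# Version ARGs ideally come from the same place as the rest of the image
# (e.g. .versions), so ROCm-related components can't silently drift.
ARG VLLM_VERSION
ARG VLLM_ROCM_VERSION
ARG PYTORCH_ROCM_VERSION
RUN . /opt/vllm-env/bin/activate \
    && ~/.local/bin/uv pip install "vllm==${VLLM_VERSION}" \
        --extra-index-url https://wheels.vllm.ai/${VLLM_VERSION}/${VLLM_ROCM_VERSION} \
        --extra-index-url https://download.pytorch.org/whl/${PYTORCH_ROCM_VERSION} \
        --index-strategy unsafe-best-match
```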
## Individual Comments
### Comment 1
<location path="Dockerfile" line_range="132-138" />
<code_context>

```dockerfile
+
+# Install uv and vLLM with ROCm support as modelrunner user.
+# Uses upstream vLLM ROCm wheels (vLLM gained native ROCm support upstream).
+RUN curl -LsSf https://astral.sh/uv/install.sh | sh \
+    && ~/.local/bin/uv venv --python 3.12 /opt/vllm-env \
+    && . /opt/vllm-env/bin/activate \
+    && ~/.local/bin/uv pip install vllm \
+        --extra-index-url https://wheels.vllm.ai/${VLLM_VERSION}/${VLLM_ROCM_VERSION} \
+        --extra-index-url https://download.pytorch.org/whl/${PYTORCH_ROCM_VERSION} \
+        --index-strategy unsafe-best-match
+
+RUN /opt/vllm-env/bin/python3.12 -c "import vllm; print(vllm.__version__)" > /opt/vllm-env/version
```

</code_context>
<issue_to_address>
**issue (bug_risk):** vLLM version arg is not used to pin the installed package, which could desync from the ROCm wheel index path.
`VLLM_VERSION` is only used in the ROCm wheel index URL, while `uv pip install vllm` pulls the latest version from the indexes. If a newer release appears, `uv` may install a different version than the one implied by the ROCm index path, causing resolution errors or mismatched binaries.
Please pin `vllm` to `VLLM_VERSION` so the installed package matches the ROCm wheel index:
```dockerfile
&& ~/.local/bin/uv pip install "vllm==${VLLM_VERSION}" \
--extra-index-url https://wheels.vllm.ai/${VLLM_VERSION}/${VLLM_ROCM_VERSION} \
--extra-index-url https://download.pytorch.org/whl/${PYTORCH_ROCM_VERSION} \
--index-strategy unsafe-best-match
```
</issue_to_address>
### Comment 2
<location path="Dockerfile" line_range="124" />
<code_context>

```dockerfile
+
+USER root
+
+RUN apt update && apt install -y python3.12 python3.12-venv python3.12-dev curl ca-certificates build-essential && rm -rf /var/lib/apt/lists/*
+
+RUN mkdir -p /opt/vllm-env && chown -R modelrunner:modelrunner /opt/vllm-env
```

</code_context>
<issue_to_address>
**suggestion (performance):** Consider using --no-install-recommends to keep the ROCm vLLM image smaller and more deterministic.
You likely only need the explicitly listed packages, so preventing installation of recommended extras will keep this base image lean and more reproducible:
```dockerfile
RUN apt update && apt install -y --no-install-recommends \
python3.12 python3.12-venv python3.12-dev curl ca-certificates build-essential \
&& rm -rf /var/lib/apt/lists/*
```
```suggestion
RUN apt update && apt install -y --no-install-recommends python3.12 python3.12-venv python3.12-dev curl ca-certificates build-essential && rm -rf /var/lib/apt/lists/*
```
</issue_to_address>
@ericcurtin yo! Take a look when you have some free time. Thanks!
Code Review
This pull request introduces ROCm (AMD GPU) support for vLLM by adding a new Docker build stage and corresponding Makefile targets. While the infrastructure additions are appropriate, several critical correctness issues were identified in the Dockerfile: the vLLM version is invalid, the ROCm 6.3 specification will cause build failures and runtime library mismatches with the base image, and the wheel index URL is missing a required version prefix.
```dockerfile
# so the resulting image contains the ROCm runtime that vLLM links against.
FROM llamacpp AS vllm-rocm

ARG VLLM_VERSION=0.19.1
ARG VLLM_ROCM_VERSION=rocm63
ARG PYTORCH_ROCM_VERSION=rocm6.3
```
The ROCm versions specified here are problematic for the following reasons:

- **PyTorch Index 404:** PyTorch does not currently host wheels for ROCm 6.3 on their official index. `https://download.pytorch.org/whl/rocm6.3` will return a 404 error, causing the build to fail. The latest stable version is `rocm6.2`.
- **Base Image Mismatch:** The `llamacpp` base image (resolved from `ghcr.io/ggml-org/llama.cpp:server-rocm`) is built using ROCm 6.2. Installing ROCm 6.3 wheels on top of a ROCm 6.2 runtime will lead to critical failures at runtime due to incompatible shared library versions (e.g., `libamdhip64.so.6`).

Suggested fix:

```dockerfile
ARG VLLM_ROCM_VERSION=rocm62
ARG PYTORCH_ROCM_VERSION=rocm6.2
```
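One way to confirm what ROCm runtime the base image actually ships, assuming the standard ROCm layout inside it (the `/opt/rocm/.info/version` path and the entrypoint override are assumptions about that image):

```sh
docker run --rm --entrypoint cat ghcr.io/ggml-org/llama.cpp:server-rocm /opt/rocm/.info/version
```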
```dockerfile
    && ~/.local/bin/uv venv --python 3.12 /opt/vllm-env \
    && . /opt/vllm-env/bin/activate \
    && ~/.local/bin/uv pip install vllm \
        --extra-index-url https://wheels.vllm.ai/${VLLM_VERSION}/${VLLM_ROCM_VERSION} \
```
Adds a new Docker build target (final-vllm-rocm) that builds upstream vLLM from source on top of rocm/vllm-dev:base (AMD's pre-built ROCm dev image containing PyTorch ROCm, Triton, flash-attention, and the ROCm SDK). vLLM is checked out at the tagged release matching VLLM_VERSION — no fork, no custom wheels. The runtime stage mirrors the /opt/vllm-env layout that the existing vLLM backend expects via symlinks, so no Go code changes are needed. Unlike the CUDA vllm variant, the ROCm image does not include llama.cpp: the base images are incompatible (different ROCm runtime versions), so the ROCm vllm image is intended as a vLLM-only artifact. Fixes #412
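A minimal sketch of what such a stage could look like, assuming the stage name, clone path, and symlink layout; the PR's actual Dockerfile may differ in the details:

```dockerfile
# Build upstream vLLM from source on AMD's ROCm dev image (illustrative only).
FROM rocm/vllm-dev:base AS vllm-rocm
ARG VLLM_VERSION
ARG PYTORCH_ROCM_ARCH="gfx90a;gfx942;gfx1100;gfx1101"
ENV PYTORCH_ROCM_ARCH=${PYTORCH_ROCM_ARCH}
# Check out the tagged upstream release and build it against the PyTorch ROCm,
# Triton, and flash-attention stack already shipped in the base image.
RUN git clone --branch "v${VLLM_VERSION}" --depth 1 \
        https://github.com/vllm-project/vllm.git /opt/vllm-src \
    && pip install --no-build-isolation /opt/vllm-src
# Mirror the /opt/vllm-env layout the existing backend expects; the base image
# installs Python packages system-wide, so symlinks are enough here.
RUN mkdir -p /opt/vllm-env/bin \
    && ln -s "$(command -v python3)" /opt/vllm-env/bin/python3 \
    && ln -s "$(command -v vllm)" /opt/vllm-env/bin/vllm
```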
Force-pushed from 76db8d3 to c285e47
Very nice @dmedovich ... You could do the project a huge favour actually... We are lacking people with AMD GPUs, so if you could confirm llama.cpp/vLLM works with AMD GPUs that would be great! We are also supposed to automatically detect when your machine has an AMD GPU and just do the right thing in that case; we don't know if that works either... Merging on green build
@ericcurtin Will install Docker a bit later and check all three points; I'll come back with detailed info on each.
Fixes #412
Adds a `final-vllm-rocm` Docker build target that runs vLLM on AMD GPUs using upstream vLLM (no fork, no third-party wheels), per the direction in the issue.

Approach
Builds upstream vLLM from source on top of `rocm/vllm-dev:base` — the same base image vLLM's own `docker/Dockerfile.rocm` uses. That base already ships PyTorch ROCm, Triton ROCm, flash-attention ROCm, and the ROCm SDK pre-compiled by AMD. We just clone `vllm@v${VLLM_VERSION}` and build the wheel.

The runtime stage mirrors the `/opt/vllm-env` layout that the existing vLLM backend in `pkg/inference/backends/vllm/vllm.go` expects (via symlinks, since `rocm/vllm-dev:base` installs Python deps system-wide).

New targets
- `make docker-build-vllm-rocm` — build the image
- `make docker-run-vllm-rocm` — build + run

`LLAMA_SERVER_VARIANT=rocm` is passed so the existing logic restricts `DOCKER_BUILD_PLATFORMS` to `linux/amd64`.

Trade-offs to flag
- The `vllm` stage builds on top of `llamacpp` and inherits the llama-server binary. For ROCm, `rocm/vllm-dev:base` and the llama.cpp ROCm image use incompatible ROCm runtime / PyTorch versions, so `vllm-rocm` is a vLLM-only artifact. Happy to revisit if you'd prefer a combined shape.
- `PYTORCH_ROCM_ARCH` defaults to `gfx90a;gfx942;gfx1100;gfx1101` (MI200/MI300 + RDNA3 7900/7800) — what upstream vLLM officially supports. Exposed as a Dockerfile ARG, overridable at build time (example below).
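For example, narrowing the build to a single architecture might look like this (the `final-vllm-rocm` target name is the one this PR adds; the image tag is just a placeholder):

```sh
docker build --target final-vllm-rocm \
  --build-arg PYTORCH_ROCM_ARCH="gfx942" \
  -t model-runner:vllm-rocm .
```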
Tested

- `go build ./...`, `go test ./...`, `go vet`, `golangci-lint` — clean
- `make help` shows the new targets
- Not verified: `make docker-build-vllm-rocm` end-to-end. I don't have a supported GPU (only an RX 6600 = gfx1032, not in vLLM's official ROCm targets), so a runtime test wouldn't be informative even if the build succeeded.

Would appreciate a sanity check on the architectural choice (vLLM-only ROCm image vs combined). Happy to adjust.
History
The original commit (76db8d3, force-pushed) installed vLLM via pip from `wheels.vllm.ai/${VLLM_VERSION}/rocm63`, but that URL doesn't exist — vLLM only publishes CUDA wheels there. Rewrote to the source-build approach above.